NSF PAR Search | NSF Public Access Repository

One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation

Wang, Zhendong; Li, Max; Mandlekar, Ajay; Xu, Zhenjia; Fan, Jiaojiao; Narang, Yashraj; Fan, Linxi; Zhu, Yuke; Balaji, Yogesh; Zhou, Mingyuan; et al (July 2025, International Conference on Machine Learning, 2025)

Free, publicly-accessible full text available July 1, 2026
Flow as the Cross-Domain Manipulation Interface

Xu, Mengda; Xu, Zhenjia; Xu, Yinghao; Chi, Cheng; Wetzstein, Gordon; Veloso, Manuela; Song, Shuran (October 2024, Conference on Robot Learning)

Full Text Available
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

Chi, Cheng; Xu, Zhenjia; Pan, Chuer; Cousineau, Eric; Burchfiel, Benjamin; Feng, Siyuan; Tedrake, Russ; Song, Shuran (March 2024, Robotics: Science and Systems)

Full Text Available
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners

Ren, Allen Z; Dixit, Anushri; Bodrova, Alexandra; Singh, Sumeet; Tu, Stephen; Brown, Noah; Xu, Peng; Takayama, Leila; Xia, Fei; Varley, Jake; et al (November 2023, Conference on Robot Learning (CoRL))

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model-finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models.
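The abstract above describes calibrating a planner's uncertainty with conformal prediction and asking for help when the plan is ambiguous. The following is a minimal sketch of generic split conformal prediction, not the authors' implementation. It assumes the planner exposes a probability for each candidate option and uses `1 - p` as the nonconformity score; the function names `calibrate_threshold`, `prediction_set`, and `plan_or_ask` are hypothetical.

```python
import math


def calibrate_threshold(cal_true_probs, epsilon):
    """Return the conformal threshold qhat from a calibration set.

    cal_true_probs: probability the planner gave the correct option in
    each calibration scenario (assumed available for illustration).
    epsilon: tolerated error rate, so target coverage is 1 - epsilon.
    """
    n = len(cal_true_probs)
    # Nonconformity score: high when the planner doubted the true option.
    scores = sorted(1.0 - p for p in cal_true_probs)
    # Finite-sample corrected rank (1-indexed) of the empirical quantile.
    rank = math.ceil((n + 1) * (1.0 - epsilon))
    if rank > n:
        return 1.0  # too few calibration points: keep every option
    return scores[rank - 1]


def prediction_set(option_probs, qhat):
    """Keep every option whose score 1 - p does not exceed qhat."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]


def plan_or_ask(option_probs, qhat):
    """Act when exactly one option survives; otherwise ask a human."""
    options = prediction_set(option_probs, qhat)
    if len(options) == 1:
        return ("act", options[0])
    return ("ask", options)


# Made-up calibration data, for illustration only.
cal = [0.9, 0.8, 0.6, 0.5, 0.4, 0.7, 0.3, 0.45, 0.35]
qhat = calibrate_threshold(cal, epsilon=0.25)
print(plan_or_ask({"bowl A": 0.45, "bowl B": 0.40, "cup": 0.15}, qhat))
print(plan_or_ask({"bowl A": 0.85, "bowl B": 0.10, "cup": 0.05}, qhat))
```

With these made-up numbers, the first query leaves two options in the prediction set, so the sketch asks for help; the second leaves one, so it acts. A smaller `epsilon` enlarges the sets and triggers help requests more often.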
Full Text Available
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation

Zhou, Xian; Zhu, Bo; Xu, Zhenjia; Tung, Hsiao-Yu; Torralba, Antonio; Fragkiadaki, Katerina; Gan, Chuang (March 2023, International Conference on Learning Representations 2023)
Universal Manipulation Policy Network for Articulated Objects

https://doi.org/10.1109/LRA.2022.3142397

Xu, Zhenjia; He, Zhanpeng; Song, Shuran (April 2022, IEEE Robotics and Automation Letters)

Full Text Available
BusyBot: Learning to Interact, Reason, and Plan in a BusyBoard Environment

Liu, Zeyi; Xu, Zhenjia; Song, Shuran (January 2022, 6th Conference on Robot Learning (CoRL 2022))

Full Text Available
AdaGrasp: Learning an Adaptive Gripper-Aware Grasping Policy

https://doi.org/10.1109/ICRA48506.2021.9560833

Xu, Zhenjia; Qi, Beichun; Agrawal, Shubham; Song, Shuran (May 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA))

This paper aims to improve robots’ versatility and adaptability by allowing them to use a large variety of end-effector tools and quickly adapt to new tools. We propose AdaGrasp, a method to learn a single grasping policy that generalizes to novel grippers. By training on a large collection of grippers, our algorithm is able to acquire generalizable knowledge of how different grippers should be used in various tasks. Given a visual observation of the scene and the gripper, AdaGrasp infers the possible grasp poses and their grasp scores by computing the cross convolution between the shape encodings of the gripper and scene. Intuitively, this cross convolution operation can be considered as an efficient way of exhaustively matching the scene geometry with gripper geometry under different grasp poses (i.e., translations and orientations), where a good "match" of 3D geometry will lead to a successful grasp. We validate our methods in both simulation and real-world environments. Our experiment shows that AdaGrasp significantly outperforms the existing multi-gripper grasping policy method, especially when handling cluttered environments and partial observations. Code and data are available at https://adagrasp.cs.columbia.edu.
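The exhaustive matching that the abstract attributes to cross convolution can be illustrated with a toy 2D example. This is a sketch under strong simplifying assumptions, not the AdaGrasp model: the "encodings" are hand-written binary grids rather than learned 3D features, orientations are limited to 90-degree steps, and the helper names `rotate90`, `cross_correlate`, and `best_grasp` are hypothetical.

```python
def rotate90(kernel):
    """Rotate a 2D list-of-lists by 90 degrees clockwise."""
    return [list(row) for row in zip(*kernel[::-1])]


def cross_correlate(scene, kernel):
    """Slide the kernel over the scene (no padding) and return the
    match score for every translation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(scene) - kh + 1
    out_w = len(scene[0]) - kw + 1
    return [
        [
            sum(scene[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def best_grasp(scene, gripper, num_rotations=4):
    """Score every (rotation, translation) pair and return the best as
    (score, angle_in_degrees, (row, col))."""
    best = None
    kernel = gripper
    for r in range(num_rotations):
        scores = cross_correlate(scene, kernel)
        for i, row in enumerate(scores):
            for j, score in enumerate(row):
                if best is None or score > best[0]:
                    best = (score, r * 90, (i, j))
        kernel = rotate90(kernel)
    return best


# Toy scene: a vertical two-cell object. Toy gripper: a horizontal bar.
scene = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
gripper = [[1, 1]]
print(best_grasp(scene, gripper))  # (2, 90, (1, 1))
```

The highest score occurs when the bar is rotated to align with the object, which mirrors the intuition that a good geometric match under some translation and orientation indicates a promising grasp pose.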
Full Text Available
Learning 3D Dynamic Scene Representations for Robot Manipulation

Xu, Zhenjia; He, Zhanpeng; Wu, Jiajun; Song, Shuran (November 2020, Proceedings of the 2020 Conference on Robot Learning)

3D scene representation for robot manipulation should capture three key object properties: permanency - objects that become occluded over time continue to exist; amodal completeness - objects have 3D occupancy, even if only partial observations are available; spatiotemporal continuity - the movement of each object is continuous over space and time. In this paper, we introduce 3D Dynamic Scene Representation (DSR), a 3D volumetric scene representation that simultaneously discovers, tracks, and reconstructs objects and predicts their dynamics while capturing all three properties. We further propose DSR-Net, which learns to aggregate visual observations over multiple interactions to gradually build and refine DSR. Our model achieves state-of-the-art performance in modeling 3D scene dynamics with DSR on both simulated and real data. Combined with model predictive control, DSR-Net enables accurate planning in downstream robotic manipulation tasks such as planar pushing. Code and data are available at dsr-net.cs.columbia.edu.
Full Text Available